1. Dataset loading and Data Exploration

Training Dataset Analysis

The figure shows that only 38.4% of the passengers survived the sinking of the Titanic.
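As a minimal sketch, the survival percentage can be computed directly from the target column. The small DataFrame below is a made-up stand-in for train.csv, and the column name `Survived` is assumed to follow the standard Titanic dataset layout.

```python
import pandas as pd

# Made-up stand-in for train.csv (illustration only, not the real data).
train = pd.DataFrame({"Survived": [0, 1, 1, 0, 0, 1, 0, 0]})

# The mean of a 0/1 column equals the fraction of ones.
survival_rate = train["Survived"].mean() * 100
print(f"Survival rate: {survival_rate:.1f}%")
```

On the real training file the same two lines would be applied to the DataFrame returned by `pd.read_csv("train.csv")`.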

From the above figure, it seems that there were more male than female passengers. Overall, most passengers were between 20 and 35 years old.

From the above figure, it seems that passengers older than 30 were more likely to buy a first-class ticket, whereas younger passengers, around 20 years old, were more likely to buy a third-class ticket, possibly because they did not want to spend much money on the ticket.

Testing Dataset Analysis

Exploratory Data Analysis (EDA) on the Training Dataset (train.csv)

The above figure shows that passengers above 40 years of age were less likely to survive. However, it also shows that most survivors were between 20 and 40 years old. One reason could be that there were more young than old passengers on the Titanic overall.

The above figure shows that there were more female survivors than male survivors.

The above figure shows that the highest number of survivors held 1st class tickets. On the other hand, passengers with 3rd class tickets were the most likely not to survive.

The above figure shows how survival relates to the number of siblings or spouses aboard. Passengers with 0 siblings or spouses aboard had a high chance of survival, and this chance appears to fall as the number of siblings or spouses increases.

The above figure shows the same pattern for the number of parents or children aboard. Passengers with no parents or children aboard had the highest chance of surviving, and the chance falls as the number of parents or children increases. Passengers with 4 in total (children + parents) had a 0% chance of surviving.
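The per-group comparisons above can be checked numerically. The sketch below is only an illustration on made-up rows: the helper `survival_summary` is hypothetical, and the column names `Survived`, `Sex` and `Pclass` are assumed to match the standard Titanic layout. It reports the group size, the number of survivors, and the survival rate side by side, since a group with many survivors is not necessarily the group with the highest rate.

```python
import pandas as pd

# Made-up rows standing in for train.csv (illustration only).
train = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 0, 1, 0, 0],
    "Sex": ["female", "male", "female", "male", "male", "female", "male", "female"],
    "Pclass": [1, 3, 2, 3, 1, 1, 3, 3],
})

def survival_summary(df, column):
    """Passenger count, survivor count and survival rate for each group."""
    grouped = df.groupby(column)["Survived"]
    return grouped.agg(passengers="count", survivors="sum", rate="mean")

by_sex = survival_summary(train, "Sex")
by_class = survival_summary(train, "Pclass")
print(by_sex)
print(by_class)
```

The same helper could be called with `"SibSp"`, `"Parch"` or `"Embarked"` if those columns are present.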

From the above figure, it seems that when the fare cost goes up, there is only a small increase in the chance of surviving.

C = Cherbourg, Q = Queenstown, S = Southampton

From the above figure, it can be seen that the largest number of survivors boarded the Titanic at Southampton, England. Fewer survivors boarded at Cherbourg, France, and Queenstown (Cobh), Ireland.

The above figure shows that women with the titles "Miss" and "Mrs." survived the most. This suggests that women were given priority over men during the incident.
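A title feature like the one discussed above can be derived from the passenger name. The following is a sketch under assumptions, not necessarily the exact steps used here: the column names `Name` and `title_name`, the set of common titles, and the `Other_titles` bucket are inferred from the feature name 'title_name_Other_titles' and may differ from the original preprocessing.

```python
import pandas as pd

# A few sample names in the "Surname, Title. Given names" format.
names = pd.DataFrame({"Name": [
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
    "Uruchurtu, Don. Manuel E",
]})

# The title is the text between the comma and the first period.
titles = names["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Assumed grouping: keep frequent titles, bucket the rest as "Other_titles".
common = {"Mr", "Mrs", "Miss", "Master"}
names["title_name"] = titles.where(titles.isin(common), "Other_titles")
print(names["title_name"].tolist())
```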

In the training dataset, there are now equal numbers of samples with Survived as '0' and Survived as '1'. The dataset is balanced, which prevents the Machine Learning (ML) model from being biased towards the majority class.
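The text does not state how the classes were balanced. One common option is random oversampling of the minority class, sketched below on made-up rows. This is an assumption for illustration and may not be the method used here.

```python
import pandas as pd

# Made-up, imbalanced stand-in for the training data (illustration only).
train = pd.DataFrame({
    "Survived": [0, 0, 0, 0, 0, 1, 1, 1],
    "Fare": [7.25, 8.05, 7.90, 13.00, 21.08, 71.28, 53.10, 26.55],
})

counts = train["Survived"].value_counts()
minority = counts.idxmin()
deficit = int(counts.max() - counts.min())

# Random oversampling: draw extra minority rows with replacement.
extra = train[train["Survived"] == minority].sample(
    n=deficit, replace=True, random_state=0
)
balanced = pd.concat([train, extra], ignore_index=True)
balanced_counts = balanced["Survived"].value_counts()
print(balanced_counts)
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.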

From the above plot, it can be seen that the features 'Age', 'SibSp', 'family_size' and 'title_name_Other_titles' contribute little and have very low correlation with the target variable. Hence, these features can be removed from the dataset.
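The correlation-based filtering described above can be sketched as follows. The data is made up, the `Sex_female` column is a hypothetical encoded feature, and the 0.1 threshold is an arbitrary value chosen for illustration, since the text does not give a cutoff.

```python
import pandas as pd

# Made-up numeric features (illustration only).
train = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0, 1, 0],
    "Sex_female": [1, 0, 1, 0, 1, 0, 0, 1],
    "Age": [22, 22, 35, 35, 50, 50, 28, 28],
})

# Pearson correlation of every feature with the target.
corr_with_target = train.corr()["Survived"].drop("Survived")

# Hypothetical cutoff: features below it are treated as weakly correlated.
threshold = 0.1
weak = corr_with_target[corr_with_target.abs() < threshold].index.tolist()

reduced = train.drop(columns=weak)
print(weak)
print(reduced.columns.tolist())
```

Pearson correlation only captures linear relationships, so a low value does not by itself prove that a feature carries no information.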

Implementing a Machine Learning model

Before making predictions, it is important to check which ML model predicts best. Therefore, several classification models are evaluated on the dataset below.

From the above figure, it can be seen that, among all classifiers, the LightGBM classifier has the highest accuracy and the lowest cross-validation error. Therefore, LightGBM is chosen as the model for further analysis.
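A comparison of this kind can be sketched with scikit-learn's `cross_val_score`. The example below makes several assumptions: it uses synthetic data rather than the Titanic features, the three scikit-learn models are stand-ins for the classifiers actually compared, and "cross-validation error" is taken to mean one minus the mean cross-validated accuracy. `lightgbm.LGBMClassifier` follows the scikit-learn estimator interface and could be added to the dictionary if the package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the prepared Titanic features and labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Stand-in candidates; the original comparison may include other models.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy for each candidate.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = {"accuracy": scores.mean(), "cv_error": 1 - scores.mean()}

best = max(results, key=lambda n: results[n]["accuracy"])
print(best, results[best])
```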

Prediction on the Testing Dataset (test.csv) using LightGBM as a Machine Learning model